Abstract

Source-controlled routing has been proposed as a way to 
improve flexibility of future network architectures, as well 
as simplifying the data plane. However, if a packet spec- 
ifies its path, this precludes fast local re-routing within 
the network. We propose SlickPackets, a novel solution 
that allows packets to slip around failures by specifying 
alternate paths in their headers, in the form of compactly- 
encoded directed acyclic graphs. We show that this can 
be accomplished with reasonably small packet headers for 
real network topologies, and results in responsiveness to 
failures that is competitive with past approaches that re- 
quire much more state within the network. Our approach 
thus enables fast failure response while preserving the 
benefits of source-controlled routing. 

1. INTRODUCTION 

Traditional routing protocols are network-controlled:
routes are computed within the network, with each router
picking, from among its neighbors, the next-hop to each 
destination. Examples include BGP for interdomain rout- 
ing, and OSPF for intradomain routing. An alternate 
paradigm, source-controlled routing (SCR), improves 
the flexibility of the network architecture. Rather than 
computing all routes within the network, SCR architectures
[10, 20, 28-30] reserve some choice of routes for the
source¹ to select on a per-packet basis. The uses of SCR's
routing flexibility are quite diverse. Sources can observe 
end-to-end reliability problems and switch to a working 
path within a few round-trip times (RTTs); pick better-performing
routes based on observed performance [5, 11, 24];
improve load balance since path selection is finer-grained
[23]; encourage competition among network providers [7];
improve security [27]; or optimize for other
application-specific objectives. SCR is thus a promising 
approach to improve the flexibility of the network layer 
in future Internet architectures. 

However, one remaining problem is that of fast failure
reaction. This problem arose in early network-controlled
routing (NCR) protocols, which suffered from unreliability
during network dynamics: during the distributed
convergence process, packets could enter "black holes" or
loops, resulting in tens of seconds or minutes of downtime
in Internet end-to-end paths [14, 26]. Treating these basic
protocols as a baseline, two high-level approaches have
been proposed to improve failure reaction.

This is an extended version of a paper that appeared in
ACM SIGMETRICS 2011. Supporting code is available
at http://code.google.com/p/slick-packets/
1 In this paper, we use "source" to refer either to end-hosts
or to edge routers acting on their behalf.

The first approach works within the NCR paradigm 
by computing an alternate path to each destination (or 
IP prefix or AS); a router can locally switch to the alter- 
nate path without waiting for a control-plane convergence 
process. Packets can thus be delivered continuously, ex- 
cept for the minimal time it takes for a router to detect 
failure of one of its directly connected links and locally 
switch to an alternate path. Examples include MPLS 
Fast Reroute [22], SafeGuard [18], and FCP [17] for intradomain
routing, and R-BGP [16] for interdomain routing.
However, this approach lacks the routing flexibility
of SCR. 

A second approach to improve failure reaction is to 
leverage SCR's routing flexibility: a source can switch 
routes without waiting for the Internet's control plane to 
reconverge. While this improves failure reaction time rel- 
ative to the baseline above, the source still must wait to 
receive notice of the failure. Regardless of the means of 
notification, this will take at least on the order of one 
RTT, which at Internet scales would be much slower than 
the first approach of using NCR with alternate paths. 
And in the SCR proposals that provide the most flexibility
[10, 29], sources specify in the packet header an explicit
route (perhaps at the level of autonomous systems) rather 
than a destination, so the NCR and SCR techniques can- 
not be immediately combined. 

The goal of this paper is to achieve the best of two 
worlds: the fast failure reaction of alternate routes em- 
bedded within the network, and the flexibility of routes 
chosen by sources at the edge of the network. To meet 
this goal, we work within the SCR paradigm, but with a 
twist. Instead of specifying a single path to the destina- 
tion, the packet header contains a directed acyclic graph 
that we call the forwarding subgraph (FS). Each router 
along the packet's path may choose to forward it along 
any of the outgoing links at that router's node in the 
FS (optionally preferring a path marked as the primary),
with no danger of causing a forwarding loop. This ap- 
proach, which we call SlickPackets, allows packets to 
"slip" around failures in-flight while retaining the flexibil- 
ity of source route control. Moreover, SlickPackets pro- 



vides a scalability benefit over NCR with alternate paths: 
rather than requiring multiple routes to every destination 
in every router's forwarding table, SlickPackets routers 
need only local information. 

Of course, our approach also presents several challenges. 
Chief among these is how to encode an FS with suffi- 
cient path diversity into the small space afforded by a 
packet header. We introduce techniques through which 
the FS can be encoded compactly enough for our mech- 
anism to be feasible. For example, an FS providing an 
alternate path at every hop along the primary occupies 
less than 26 bytes for 99% of evaluated source-destination 
pairs in an AS-level Internet map, and no higher than 50 
bytes in all evaluated cases. Thus, the technique incurs 
manageable overhead for applications that send packets 
of moderate to large size. We also demonstrate through a 
simulation-based performance evaluation that SlickPackets
achieves failure reaction performance that is comparable
to the best of NCR architectures [18].

The rest of this paper proceeds as follows. In §2 we
present an overview of SlickPackets and its principal
design challenges. §3 gives a detailed presentation of the
SlickPackets design. We evaluate the performance of
our design in terms of header size and failure reaction in
§4. We discuss extensions of SlickPackets in §5 and
related work in §6, and conclude in §7.

2. OVERVIEW 

In this section, we provide an overview of SlickPac- 
kets, and discuss several critical design challenges. 

SlickPackets is a failure reaction mechanism for SCR 
protocols. In contrast to traditional SCR protocols that 
specify a single path in the packet header, SlickPac- 
kets enables fast recovery within the network by allowing 
the source to embed the rerouting information within the 
packet header in the form of a forwarding subgraph 
(FS). The FS specifies a set of paths that intermediate 
routers can use to reroute packets in case of failures. The 
source, if it desires, can designate one of these paths as 
the primary path to be used in the absence of failure; 
the rest of the paths are then treated as alternate paths 
that can be used if the primary path is not available. In 
order to avoid forwarding loops, SlickPackets requires 
that the FS be a directed acyclic graph (DAG). 

Performing forwarding in this way has two main ben- 
efits. First, since the source specifies the FS, it has full 
control of not only the primary path, but also how the 
network forwards the packet when the primary path is 
not available. Second, since alternate path information 
is embedded directly in the packet header, the network 
can react immediately without requiring involvement of 
the source, which reduces the reaction time in presence of 
link failures. In addition to these two benefits, the task 
of a router becomes simpler: a router requires only local 
knowledge of its neighbors, rather than needing an alter- 
nate path for every destination (which may require infor- 
mation such as the multi- homing locations of each host). 
In summary, SlickPackets achieves key benefits of SCR 
architectures (flexibility in route selection and scalability 
of network routing state) while simultaneously attaining 
failure reaction performance that is comparable to that of 
NCR architectures with backup paths. 
Figure 1: Overview of the SlickPackets design.
Step I: the source selects a forwarding subgraph 
(FS) based on the topology of the physical net- 
work; Step II: the source encodes and embeds the 
FS in the packet header to inform routers how to 
route around encountered failures; and Step III: 
routers forward the packet based on the FS con- 
tained in the packet header. 



Fig. 1 shows an example to illustrate the design of
SlickPackets. Suppose the source s wishes to send a
packet to a destination d. The source has acquired, by
some mechanism to be discussed later, a map of the network.
It selects the FS as shown in Fig. 1 and designates
(s, R1, R2, R5, d) to be the primary path. Note that the
FS provides each node on the primary path with sufficient
alternatives so that if a link on the primary path
fails, the packet can be rerouted to the destination. Next,
s constructs a data packet with the subgraph embedded
in the packet header, and forwards it on to the first-hop
R1. At R1, the packet is forwarded to the next-hop on
the primary path (R2). Now suppose that at R2 the primary
path's next hop R5 lies across a failed link (R2, R5);
then R2 forwards the packet to R4, the next-hop on its
alternate path in the FS, after which the packet continues
to R5 and finally d.

Realizing the high level idea of source-controlled rout- 
ing along an FS, however, involves several key challenges. 
We outline these challenges and our solutions here. 
Obtaining the map. Like other SCR architectures in 
which sources construct end-to-end paths, our sources re- 
quire a map of the available links. When deploying Slick- 
Packets as an interdomain routing protocol, this imme- 
diately raises questions of scalability and policy compli- 
ance. Is it feasible to push a map of the Internet, at some 
level of granularity, to every source or at least every edge 
router? Is there an acceptable way to balance control of 
network resources between the senders and the network 
owners? Fortunately, we can adopt the solutions developed
by past work, in particular NIRA [29] and pathlet
routing [10], which have shown how maps of policy-compliant
transit service can be constructed and disseminated
in ways that can be much more scalable than traditional
NCR protocols like BGP.

Packet header overhead. The next challenge is to 
design an efficient encoding mechanism that embeds the 
FS into the packet header with minimal overhead. By 
using link labels with only local significance and allocat- 
ing every bit carefully, we are able to achieve acceptable 
packet header sizes on realistic network topologies. 
Fast data-plane operations. Another challenge is to 
design an efficient data plane forwarding algorithm: the 
encoding and forwarding mechanisms in SlickPackets 
should minimize next-hop lookup time without substan- 
tially increasing header processing cost, forwarding de- 
lay and/or design complexity of modern router forward- 
ing planes. Fortunately, forwarding along an FS requires 
only lookup and pointer-increment operations, as in stan- 
dard SCR protocols, and can be efficiently implemented 
in practice. 

The next section discusses our design in more detail, 
including our solutions to these challenges. 

3. SLICKPACKETS DESIGN 

In this section, we present in more detail the four main
components of SlickPackets: definition and dissemination
of the network map (§3.1); selection of a forwarding
subgraph (FS) at the source (§3.2); encoding of the FS
into the packet header (§3.3); and the data plane forwarding
mechanism at routers (§3.4).

The SlickPackets approach could be applied in multi- 
ple contexts. We describe here how the design can be ap- 
plied to interdomain and intradomain routing. The differ- 
ences principally lie in map dissemination and data plane 
forwarding, with the core approach taking the same form 
in both contexts. 

3.1 Map format and dissemination 

As in other SCR protocols in which the source composes
end-to-end paths [10, 29], in SlickPackets, the source
must obtain a network "map" (topology) from which it
can construct paths. This map is an abstract directed
graph in which each directed link (u, v) at node u is
annotated with a label. The label is a compact,
variable-length bitstring, which the source will use when
encoding the FS (§3.3) to tell node u that it wants u to
use the link (u, v). Similar to an MPLS label, the label
identifies a link only locally at u, not globally. Thus, u
will generally announce labels of length ⌈log2 δ(u)⌉ bits,
where δ(u) is the degree of u.
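
As a rough illustration of the label-length rule, the following sketch computes the number of bits a node would need and hands out fixed-width labels. The function names and the choice to number links in list order are assumptions made for this example; the paper only requires that labels be locally unique bitstrings.

```python
def label_bits(degree: int) -> int:
    """Bits needed to give each of `degree` outgoing links a distinct label.

    Equals ceil(log2(degree)) for degree >= 1, computed with integers only.
    """
    if degree < 1:
        raise ValueError("a node needs at least one outgoing link")
    return (degree - 1).bit_length()


def assign_labels(neighbors):
    """Map each neighbor to a bitstring label that is unique only at this node.

    Numbering links in list order is an arbitrary choice for this sketch.
    """
    width = label_bits(len(neighbors))
    return {
        v: (format(i, f"0{width}b") if width else "")
        for i, v in enumerate(neighbors)
    }


labels = assign_labels(["a", "b", "c"])  # three links need 2-bit labels
```

Under this rule a node with a single outgoing link needs no label bits at all, and the label width grows by one bit only when the number of links crosses a power of two.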

What this map corresponds to in the physical network 
and how the map is disseminated depend on the deploy- 
ment scenario. In an intradomain environment, the map 
would correspond to the physical topology of routers and 
links and could be distributed via a protocol like OSPF 
or through a centralized coordinator as in [17] . 

In an interdomain environment, we have to deal with 
the significant challenges of scalability and network own- 
ers' transit policies. In order to overcome these challenges, 
we build on solutions developed in past work and briefly 
describe them here for completeness. 
Basic approach. Both NIRA [29] and pathlet routing
[10] provide sources with a policy-compliant map of
the Internet, roughly at the autonomous system (AS) 



level. NIRA's map assumes common customer-provider- 
peer relationships between ASes and allows a subset of 
valley-free routes: that is, packets travel up a chain of 
providers, potentially across a peering link, and down 
a chain of customers to the destination. Pathlet rout- 
ing represents this map explicitly as an arbitrary vir- 
tual topology, whose edges (pathlets) represent policy- 
compliant transit service. 

Scalability. NIRA, while dependent on the existence of 
a typical AS business hierarchy, offers the opportunity of 
vastly improving BGP's control plane scalability. Rather 
than learning an Internet-wide topology, each node learns 
its "up-graph" of routes through providers, stopping at the 
"core" of the Internet. The up-graph requires fewer than 
20 entries for 90% of domains [29], many orders of magnitude
less than the roughly 300,000 prefixes that BGP
propagates today. Each destination stores its up-graph 
in a global DNS-like database; to route to a destination, 
a source queries the database and combines its own up- 
graph with the destination's up-graph. Though the re- 
sulting map is a small fraction of the Internet, it includes 
all policy-compliant (valley-free) routes. Pathlet routing 
could use a NIRA-style approach for disseminating the 
pathlet topology, or it can be disseminated via a BGP- 
like mechanism with slightly more messaging and control 
state (< 1.7x) than traditional BGP. 

SlickPackets can take advantage of either the pathlet 
or NIRA approach for interdomain map dissemination. 
Thus, SlickPackets does not require a source to have 
complete topological knowledge of the network, but rather 
only enough to construct a path and alternate paths to 
the destination. 

We also note that SlickPackets, like other SCR and 
multipath routing architectures, can benefit from signifi- 
cantly reduced rate of control plane updates [6] compared 
with basic single-path NCR architectures. This is because 
short-lived failures need not be disseminated through the 
control plane, since failure reaction will happen anyway 
via forwarding along alternate paths without waiting for 
control-plane updates. 

Link labels. Along with the map itself, SlickPackets 
requires labels on the links. Routers (or ASes for interdo- 
main; for convenience we'll use "routers" in what follows) 
can piggyback this information with the link advertisements
[10]. To change a label, a router readvertises the
link. While readvertisements increase control traffic, we 
expect that changing a router's link labels will be fairly 
rare, for two reasons. First, the operator could change 
a single label from one bit sequence to another; however, 
there should be little need for such changes because the la- 
bels are arbitrary identifiers with no significance. Second, 
the operator may need to increase the number of links ex- 
iting the router. This may increase the label length and 
require readvertisements of all of the router's link labels, 
creating a period of inconsistency from when the router 
changes its label length to when sources receive the up- 
dated announcement. However, label lengths change only 
once every time the number of outgoing links doubles (or 
halves) in size, which is expected to be a very rare event. 

An alternate approach is to make labels self-describing: 
their first few bits encode the label length [10]. This
avoids the need to readvertise links after a length change 



and the resulting inconsistency, but labels become slightly 
longer. Since compactness is important for SlickPac- 
kets, we do not evaluate this approach in this paper. 
Map consistency. A natural question is whether all 
sources and the network must have an entirely consistent 
view of the map at all times. Fortunately, this difficult 
task is unnecessary. There are three possible types of 
inconsistency. 

First, if a source uses a non-existent label (e.g., the link 
has been removed or its label changed) , this is equivalent 
to a link failure and the packet can be re-routed along 
an alternate path. To avoid even this minor disturbance, 
routers can insert a short delay between announcing a 
label deletion and its removal from forwarding tables. 

Second, if a source uses a label that has changed to 
identify a different link, then the packet will follow an 
incorrect path and will be unlikely to reach its intended 
destination. This is similar to inconsistency problems in 
basic NCR protocols. (Unlike in basic NCR protocols, 
however, the packet cannot get into a loop of any signifi- 
cant length because one link in the DAG will be consumed 
at each hop.) To avoid label-change inconsistency, routers 
can simply use new labels rather than reusing ones that 
have recently had a different meaning. 

Third, a source might be unaware of some valid labels. 
This simply results in a slightly restricted set of options 
until it receives the relevant control plane advertisement, 
as in essentially any other distributed routing protocol. 

Thus, in all cases, inconsistency issues can be mitigated. 

3.2 Selection of the forwarding subgraph 

Once a source has obtained the network map, it se- 
lects a forwarding subgraph (FS) along which it desires 
the packet to be routed in the network. The FS is a 
DAG corresponding to a subset of nodes and links in the 
network map. The directed edges inform routers of the 
packet's allowed next-hops, and acyclicity ensures there 
are no forwarding loops. Additionally, for each node in 
the FS, the source may mark one outgoing link as the 
preferred primary. 

Sources have a great deal of flexibility in how they 
choose an FS. For instance, the source may select an FS 
that avoids any single link failure along a low-latency pri- 
mary path, avoids node failures, optimizes for other met- 
rics like bandwidth, or picks alternate paths that avoid 
shared risk link groups. We discuss some of these uses 
in §5. For concreteness, we describe here and evaluate
in §4 how the source can pick an FS that will minimize
primary-path latency and provide alternate paths to avoid 
any single link failure. As noted below, accommodating 
shared risk link groups is similar. 

Figure 2: (a) Network map; (b) forwarding subgraph. An FS
may have multiple representations of a network map node,
to allow "backtracking" without introducing cycles in the FS.

A source s, for a given destination d, constructs a
single-failure-avoiding FS as follows. First, s computes a
primary path P to d by running a shortest path algorithm
over the network map. Next, s visits each link along P,
and computes the alternate path P_i it would prefer the
packet to be routed along if that link were to fail. In
particular, for each node v_i on the primary path, we (a)
remove v_i's outgoing edge corresponding to its next hop
along the primary path; (b) compute a shortest path from
v_i to d, not using the removed outgoing edge; and (c) restore
the removed edge. In case of a node having multiple
shortest paths to the destination, the source may arbitrarily
select one of these shortest paths. Finally, the primary
and the alternate paths are assembled into the FS. Note
that the above algorithm requires |P| runs of Dijkstra's
algorithm. Surprisingly, it is possible to construct a primary
path and all the alternate paths in a single run of a
shortest-path algorithm; see [12].
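
The straightforward version of this procedure (one shortest-path run per primary-path node, not the single-run variant of [12]) can be sketched as follows. This is a minimal sketch: the graph representation, the function names, and the six-node demo map with its link costs are all assumptions for illustration, chosen to be consistent with the paths the text names for Fig. 2(a).

```python
import heapq


def shortest_path(graph, src, dst, banned=frozenset()):
    """Dijkstra over graph[u] = {v: cost}; `banned` holds directed edges to skip."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            if (u, v) in banned:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # destination unreachable without the banned edges
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def select_paths(graph, s, d):
    """Return (primary, alternates); alternates[v] avoids v's primary next hop."""
    primary = shortest_path(graph, s, d)
    if primary is None:
        return None, {}
    alternates = {}
    for v, nxt in zip(primary, primary[1:]):
        # Steps (a)-(c): skipping the edge via `banned` stands in for
        # removing it, running the search, and restoring it.
        alternates[v] = shortest_path(graph, v, d, banned={(v, nxt)})
    return primary, alternates


def undirected(edges):
    """Build graph[u][v] = cost in both directions from (u, v, cost) triples."""
    g = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


# Hypothetical map and costs, assumed for this example only.
demo = undirected([
    ("s", "R1", 1), ("R1", "R3", 1), ("R3", "R4", 1),
    ("R1", "R2", 2), ("R2", "R4", 2), ("R4", "d", 1),
])
primary, alternates = select_paths(demo, "s", "d")
```

To protect against a shared risk link group instead of a single link, the same sketch would pass every directed edge of the group in `banned`.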

Beyond single- link-failure protection, a source may want 
to protect against failures of shared risk link groups (i.e., 
sets of links that are likely to have correlated failures, 
such as multiple logical links allocated to a single physi- 
cal fiber). Assuming it has knowledge of these groups, it 
can do this by removing all links in the group in substep 
(a) above, and restoring them all in (c). 

Note that there is a subtlety in how the primary
and the alternate paths are "assembled" into the FS: if
we simply take the union of all these links and edges, we
might create a loop, violating the acyclicity requirement.
Consider the network map in Fig. 2(a). Assume that s
desires to use (s, R1, R3, R4, d) as the primary path. Then
to escape a failure of the link (R3, R4), a packet located at
R3 must follow the path (R3, R1, R2, R4, d). Taking the
union of these primary and alternate paths would result
in a loop R1 → R3 → R1. Due to symmetry, the problem
persists if (s, R1, R2, R4, d) is the primary path.

In order to avoid such loops, when adding an alternate
path edge (u, v) to the FS, we first check to see if this
would cause a loop. If so, we create a second FS representation
v' of the physical node v, and add the edge
(u, v'). This can be seen as "tunneling" the packet back
along an alternate path. In the example of Fig. 2, before
adding the second alternate path, we create a new copy
R1' corresponding to the node R1. The alternate path
then follows (R3, R1', R2, R4, d), resulting in an acyclic
representation of the FS as shown in Fig. 2(b).
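
One way to realize this cycle check is sketched below. It is an assumed implementation, not the authors' code: the FS is held as a dictionary of successor sets, nodes are strings, and a copy of a node is named by appending a prime character.

```python
def reaches(dag, a, b):
    """True if b is reachable from a in dag (dict: node -> set of successors)."""
    stack, seen = [a], set()
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x in seen:
            continue
        seen.add(x)
        stack.extend(dag.get(x, ()))
    return False


def add_path(dag, path):
    """Add `path` to the FS, using a fresh copy of a node if an edge would close a cycle."""
    cur = path[0]
    for v in path[1:]:
        if reaches(dag, v, cur):
            # Edge (cur, v) would create a loop: route via a new representation v'.
            copy = v + "'"
            while copy in dag:
                copy += "'"
            v = copy
        dag.setdefault(cur, set()).add(v)
        dag.setdefault(v, set())
        cur = v
    return dag


fs = {}
add_path(fs, ["s", "R1", "R3", "R4", "d"])   # primary path from the text
add_path(fs, ["R3", "R1", "R2", "R4", "d"])  # alternate for link (R3, R4)
```

With the two paths from the example, the edge (R3, R1) is replaced by (R3, R1'), and the alternate rejoins the original R2, R4, and d because those edges do not close a cycle.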

3.3 Encoding the forwarding subgraph 

After choosing an FS, the source must encode the FS 
into a sequence of bits and place it in the packet header. 
SlickPackets is agnostic to the particular location this 
header appears in the packet (for example, it may reside 
in a "shim" header between the IP and MAC layers, in an 
IP option, or in a novel header format in a next-generation 
Internet protocol). There are two key goals in designing 
an encoding format: (a) minimizing the size of the re- 
sulting encoding; and (b) ensuring data plane forwarding 
operations are simple. We designed and evaluated several 
encoding formats to achieve these goals. 

In this paper we present two encoding formats, called
Direct and Default. Each may result in a smaller encoding
in certain scenarios as discussed below. But the latter
resulted in smaller encoding sizes in the network topologies
we evaluated using the single-failure-avoiding FS selection
(§3.2), so it is our default.

Figure 3: Default encoding format layout. S_i is
the segment corresponding to node v_i on the primary
path. It encodes the node's primary next
hop p and alternate path (d_1, d_2, ..., d_ℓ). (length)
specifies the bit-length of the alternate path, and
(code) specifies the bit-length of the (length) field.


Direct format. The Direct format encodes the FS di- 
rectly, in the sense that the FS's DAG data structure in 
memory is essentially directly serialized into a DAG data 
structure in the packet header. The header contains a 
sequence of node representations, each containing one or 
more outgoing link representations; each link representa- 
tion contains its corresponding label and a pointer to an- 
other node within the header, corresponding to the node 
at the other end of the link. We describe the bit-layout 
of this format in detail in Appendix B.
Default format. One source of overhead in the Direct 
format is the use of pointers within the header. Our De- 
fault format avoids some of that overhead, by grouping 
together sequences of labels corresponding to alternate 
paths, without needing an explicit representation of each 
node along the alternate path. The disadvantage of this 
grouping is that it involves duplicating link representa- 
tions, similar to how a depth-first traversal of all paths in 
the DAG could visit links multiple times. 

In fact, there exist DAGs that have exponentially large 
numbers of possible traversals (thus specifying exponen- 
tially large numbers of ways the packet could be forwarded 
through the network). Consequently, the Direct format 
can be exponentially more efficient than the approach 
of Default in the most extreme case. In general, we ex- 
pect Direct will be more compact for situations in which 
the alternate paths often share nodes with one another 
or with the primary. However, in this paper we focus 
on the particular application of choosing single-failure-avoiding
FSes. For that application, we found that the
savings from avoiding pointers outweighed the duplica- 
tion of link representations, so that Default was some- 
what more compact in several realistic networks (2]). We 
therefore choose the Default format as our default and 
describe it in more detail now. 

In the Default format, the FS is represented as a sequence
of segments, one for each router on the primary
path. For instance, in Fig. 3, the primary path consists of
k hops and S_1, S_2, ..., S_k are the segments corresponding
to those k hops. The segment corresponding to a router v
on the primary path contains three pieces of information
(see Fig. 3): (a) v's next-hop on the primary path; (b)
the bit-length of the encoding of v's alternate path; and
(c) v's alternate path, as a sequence of next-hop labels.
By "v's alternate path" we mean the alternate path beginning
at v that avoids the primary next-hop from v. (We
assume here that the FS has the format of one alternate
path for each link on the primary.²)

For (a), we need to include the router's label (§3.1) for
the given outgoing edge, and similarly for (c) we include
a sequence of labels. Recall that these labels are only
locally unique to each node, which is critical to achieving
a compact encoding, because the average number of
neighbors of a router in a real-world network is typically
vastly smaller than the total number of routers in the network
[3, 13]. By exploiting the structure of real-world
graphs, we are able to reduce the size of the encoding
significantly compared with globally-unique labels.

For (b), we use two fields: (code) and (length).
Here, (length) specifies the total bit-length of all the labels
d_1, ..., d_ℓ of the alternate path. Based on our evaluation,
alternate paths are shorter than 32 bits in most cases
and always shorter than 128 bits; in case a node has no
alternate path, the alternate path bit-length is 0. Thus,
for greater compactness, we make the bit-length of the
(length) field variable and store it in the (code) field
using a prefix-free code, with the (code) bit sequences 0,
10, and 110 mapping to (length) widths of 5, 7, and 0 bits,
respectively.
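
To make the (code)/(length) scheme concrete, here is a small sketch that encodes and decodes one segment as a string of '0' and '1' characters. The field order within a segment (primary label, then (code), then (length), then the alternate labels) is an assumption based on the description above, and the function names are hypothetical; the exact bit layout is not given in this section.

```python
# (code) bit sequence -> width in bits of the (length) field, as in the text.
CODE_TO_WIDTH = {"0": 5, "10": 7, "110": 0}


def encode_segment(primary_label: str, alt_labels: list) -> str:
    """Encode one segment; labels are bitstrings such as '01'."""
    alt_bits = "".join(alt_labels)
    n = len(alt_bits)
    if n == 0:
        code, width = "110", 0      # no alternate path
    elif n < 32:
        code, width = "0", 5        # fits a 5-bit (length) field
    elif n < 128:
        code, width = "10", 7       # fits a 7-bit (length) field
    else:
        raise ValueError("alternate path too long for this sketch")
    length_field = format(n, f"0{width}b") if width else ""
    return primary_label + code + length_field + alt_bits


def decode_segment(bits: str, label_width: int):
    """Parse the segment at the front of `bits`.

    The router is assumed to know the width of its own labels.
    Returns (primary_label, alternate_bits, remaining_bits).
    """
    p = bits[:label_width]
    pos = label_width
    for code, width in CODE_TO_WIDTH.items():
        if bits.startswith(code, pos):
            pos += len(code)
            break
    else:
        raise ValueError("unknown (code) prefix")
    n = int(bits[pos:pos + width], 2) if width else 0
    pos += width
    return p, bits[pos:pos + n], bits[pos + n:]


segment = encode_segment("01", ["1", "10", "0"])  # 4 bits of alternate labels
```

Because the three (code) values are prefix-free, the decoder can test them in any order and exactly one will match.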

The header contains two additional pieces of information.
First, the SlickPackets header begins with a two-byte
field, specifying its header length. Second, a one-bit
field on-alternate? specifies whether the packet is
traversing along the primary path or an alternate path, 
and is initially false. We discuss next how routers use this 
information to forward packets. 

3.4 Forwarding 

We now describe the forwarding mechanism used by 
SlickPackets routers for the Default format. The input 
to this mechanism is the SlickPackets header described 
in §3.3, and the output is the interface out which the
packet will be forwarded. 

Upon receiving a packet, the router first checks the 
value of the SlickPackets header length. If this is 0, 
this router is the destination for the packet. If not, the 
router checks the on-alternate? bit to see whether it is 
on the primary path or on an alternate path. We describe 
the forwarding operations for the two cases separately. 
Router on the primary path. The router reads the 
first segment in the header, which corresponds to itself, 
and inspects the primary next-hop label p. If the corresponding
link is available, the router deletes this first segment
corresponding to itself. It also updates the header
length by subtracting the length of its segment. The 
packet is then forwarded to the next-hop on the primary 
path with the new header. 

If the primary next-hop link is not available, and the
alternate path length is 0, the packet is dropped. Otherwise,
the router reads its next-hop label d_1 on the alternate
path. If the link corresponding to d_1 is not available,
the packet is dropped.³ If the link is available, the router
removes all segments in the header, replacing them by
its remaining alternate path labels (d_2, ..., d_ℓ). It also
updates the header length appropriately and sets the
on-alternate? bit. The packet is then forwarded to the
next-hop via label d_1.

2 While the Default format could be generalized to have
multiple alternates at a router, or segments within segments
to provide alternates for routers along an alternate
path, we do not explore that generalization here; in any
case, such applications can use the Direct format.
3 Or any other failure reaction mechanism can be applied.


Router on an alternate path. The router reads its 
next-hop label. If the corresponding link is not available, 
the packet is dropped (or, as earlier, some other failure 
reaction mechanism is employed). If the link is available, 
the router deletes its label from the header, updates the 
header length, and forwards the packet to the next-hop. 
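
The two forwarding cases can be summarized in a short sketch. It works on a parsed, in-memory view of the header rather than on raw bits, and the `Header` class, the `forward` function, and the `link_up` callback are names invented for this illustration; an empty header stands in for a header length of 0.

```python
from dataclasses import dataclass, field

DELIVER, DROP = "deliver", "drop"


@dataclass
class Header:
    # Primary-path segments: [(primary_label, [alternate labels]), ...]
    segments: list = field(default_factory=list)
    # Remaining labels once the packet has moved onto an alternate path.
    alt_labels: list = field(default_factory=list)
    on_alternate: bool = False

    def is_empty(self):
        return not self.segments and not self.alt_labels


def forward(hdr, link_up):
    """Return (action, new_header).

    `action` is DELIVER, DROP, or the label of the outgoing link.
    `link_up(label)` reports whether the local link with that label is usable.
    """
    if hdr.is_empty():
        return DELIVER, hdr  # header length 0: this router is the destination
    if hdr.on_alternate:
        label, rest = hdr.alt_labels[0], hdr.alt_labels[1:]
        if not link_up(label):
            return DROP, hdr
        return label, Header(alt_labels=rest, on_alternate=True)
    (p, alts), rest = hdr.segments[0], hdr.segments[1:]
    if link_up(p):
        return p, Header(segments=rest)  # consume this router's segment
    if not alts or not link_up(alts[0]):
        return DROP, hdr  # no usable alternate at this router
    # Switch to the alternate: discard all segments, keep d_2, ..., d_l.
    return alts[0], Header(alt_labels=alts[1:], on_alternate=True)


hdr = Header(segments=[("0", ["1", "0"]), ("1", [])])
ok_action, ok_hdr = forward(hdr, lambda label: True)
fail_action, fail_hdr = forward(hdr, lambda label: label != "0")
```

In this sketch a drop simply returns the header unchanged; a router applying some other failure reaction mechanism, as footnote 3 allows, would hook in at those two points.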
Simplifying forwarding operations. The above de- 
scription involved removing a prefix of the header, and 
in the case of moving to an alternate path, a suffix as 
well. In some data plane implementations, these opera- 
tions may be costly. In this case, we can simply add start 
and end pointers at the front of the header, indicating the 
extent of the remaining header. In an extra 3 bytes, we 
can fit two pointers that can point to individual bits in a 
512-byte header (which is far larger than we need). 
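
The 3-byte figure follows from simple arithmetic, reproduced here as a quick check (the variable names are only for this example): addressing any bit of a 512-byte header takes 12 bits, so two pointers take 24 bits.

```python
HEADER_BYTES = 512

header_bits = HEADER_BYTES * 8                 # 4096 addressable bit positions
pointer_bits = (header_bits - 1).bit_length()  # bits to index positions 0..4095
extra_bytes = (2 * pointer_bits + 7) // 8      # start and end pointers, rounded up
```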
Interdomain vs. intradomain issues. In an intrado- 
main deployment, we may assume that each router runs 
SlickPackets and forwards packets as described above. 
However, in an interdomain deployment the forwarding 
subgraph roughly represents AS-level paths (as discussed
in more detail in §3.1). When the packet is forwarded
through an intermediate domain, that domain must forward
the packet on to the next AS-level hop. Network
operators may independently choose from a variety of 
ways to do this, for example by tunneling the packet with 
MPLS, or perhaps running SlickPackets internally as 
well as interdomain. 

4. EVALUATION 

SlickPackets advocates the idea of embedding a for- 
warding subgraph (FS) in the packet header, giving routers 
multiple forwarding options in order to provide the source 
with some property that it desires. While SlickPackets 
can support flexible FS selections that provide different 
guarantees, for concreteness, this section evaluates the FS 
selection exemplified in §3.2, which targets fast reaction in
the presence of single-link failures. The source constructs 
a DAG comprised of the shortest primary path, and the 
shortest alternate path for each node on the primary path 
in case that node's outgoing link along the primary path 
fails. In terms of performance, three metrics are impor- 
tant: (a) encoding size, (b) failure reaction effectiveness, 
and (c) router complexity and packet forwarding rates. 
We present results for (a) and (b) in this section and
discuss (c) in §7.
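
The FS selection evaluated here can be sketched as follows. This is a minimal illustration, assuming an undirected graph stored as a dictionary of per-neighbor latencies; the helper names (`shortest_path`, `build_fs`, `fs_edges`) and the toy topology are hypothetical and are not taken from the paper's implementation.

```python
import heapq


def shortest_path(graph, src, dst, banned_edge=None):
    """Dijkstra over graph[node] = {neighbor: latency}; returns a node list or None.

    banned_edge, if given, is a (u, v) pair treated as failed in both directions.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            if banned_edge is not None and (u, v) in (banned_edge, banned_edge[::-1]):
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def build_fs(graph, src, dst):
    """Primary shortest path plus, for each primary-path node, the shortest
    alternate path to dst that avoids that node's outgoing primary link."""
    primary = shortest_path(graph, src, dst)
    alternates = {}
    if primary is None:
        return None, alternates
    for u, v in zip(primary, primary[1:]):
        alt = shortest_path(graph, u, dst, banned_edge=(u, v))
        if alt is not None:
            alternates[u] = alt
    return primary, alternates


def fs_edges(primary, alternates):
    """Set of directed edges in the FS (the 'FS size' counts these)."""
    edges = set(zip(primary, primary[1:]))
    for alt in alternates.values():
        edges.update(zip(alt, alt[1:]))
    return edges


# Toy topology (latencies in ms), chosen so that no shortest path is tied.
graph = {
    "s": {"a": 1, "b": 2},
    "a": {"s": 1, "d": 1, "b": 2},
    "b": {"s": 2, "a": 2, "d": 2},
    "d": {"a": 1, "b": 2},
}
primary, alternates = build_fs(graph, "s", "d")
edges = fs_edges(primary, alternates)
```

In this toy example the primary path is s, a, d and both alternates pass through b, so the FS shares the edge (b, d) between them.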

Topologies. We use three network topologies in our 
evaluation: the latency-annotated topology from Sprint 
ISP 1239 0, with 315 nodes and 972 links; an AS-level 
map of the Internet [13], with 33,508 nodes and 75,001
links; and the largest component, with 190,914 nodes and 
607,610 links, of a router-level map of the Internet pp. 
The latter two topologies lack latency information; we 
take all links to have equal length. While using Slick- 
Packets directly on a router-level map of the Internet 
is not a likely deployment scenario (due to privacy and 
scaling issues, ASes do not propagate internal topologies 
globally in today's Internet), we consider this extreme 
design point to investigate scaling issues of our design. 



4.1 Encoding size 

Since we encode the FS into the packet header, the 
encoding size determines the bandwidth overhead. We 
evaluate the resulting encoding sizes of the Direct and 
Default encoding formats presented in §3.3 for FSes
constructed using the algorithm presented in §3.2.

Furthermore, regardless of the encoding format used, 
the FS size — the number of edges — is a factor influenc- 
ing the encoding size. We are thus also interested in com- 
paring the sizes of FSes constructed by the algorithm
described in §3.2 to lower bounds on the sizes of FSes
returned by any algorithm that provides shortest path 
latencies and single-link failure protection. These lower 
bounds impose a fundamental limit on the encoding size; 
intuitively, for a given encoding format that already uses 
optimized label lengths, it is hard to reduce the encod- 
ing size significantly without reducing the FS size. We 
describe in Appendix D an algorithm that yields a lower
bound on the size of the FS for a given primary path 
hopcount. 

Methodology. We evaluate all 98,910 possible ordered 
source-destination pairs of the Sprint topology. For the 
AS- and router-level topologies, we randomly sample ten 
million unique ordered source-destination pairs. For each 
pair, we record these values: the Default and Direct en- 
coding sizes, the size of the FS constructed using our al- 
gorithm, and the lower bound on FS sizes. 
Results. Fig. 4 shows the encoding size results. We see
that Default has somewhat smaller size almost always; 
Direct performs noticeably better only in the extreme tail 
of the router-level topology. We therefore discuss Default 
in what follows. For the intradomain Sprint topology, the 
maximum encoding size is 58 bytes. The plot has a long 
tail with 90% and 99% of the source-destination pairs 
requiring less than 21 bytes and 34 bytes of encoding, 
respectively. For the interdomain AS-level map of the 
Internet, the maximum encoding size is 50 bytes. As with 
the Sprint topology, the plot has a long tail, with 90% of 
the source-destination pairs resulting in encodings of less 
than 21 bytes; 99% of the source-destination pairs result 
in less than 26 bytes. 

For the extreme case of router-level topology, 90% of the 
source-destination pairs result in encodings of less than 43 
bytes; 99% less than 60 bytes. The remaining less than 
1% of the source-destination pairs constitute the long tail, 
with maximum encoding size of 132 bytes. Although the 
router-level realization of SlickPackets may be imprac- 
tical, the above results demonstrate that SlickPackets 
can scale on graphs as large as 200,000 nodes with mod- 
erate increase in the packet header sizes. If desired, this 
overhead may be amortized over more data (e.g., by lever- 
aging IPv6 jumbo frames) or using SlickPackets only 
for application data that is most sensitive to failures. 

Fig. 5 shows the FS size (in number of links) and lower
bound. For the AS-level and router-level topologies, our 
FS size is very close to the lower bound; for the Sprint 
topology, the difference is somewhat larger. Overall, the 
results suggest that, for handling single-link failures, our 
simple FS selection algorithm is relatively close to optimal 
in terms of minimizing the number of links in the FS. 

For the Sprint topology, there is also a long tail in both
our FS sizes and the lower bounds. The reason is that
there are a few source-destination pairs that have long
primary paths, requiring alternate paths for a large number
of nodes, resulting in a larger number of edges.

Figure 4: CDF of SlickPackets encoding size in bytes for the Direct and Default encoding formats, for
handling single-link failures. Panels: (a) Sprint topology, (b) AS-level topology, (c) router-level
topology; x-axes: encoding size (bytes).

Figure 5: CDF of SlickPackets FS size and the lower bound in number of edges for handling single-link
failures.

4.2 Failure reaction effectiveness 

One metric to evaluate the effectiveness of a failure re- 
action mechanism is the packet stretch, the ratio of the 
length of a packet's path to the length of the shortest 
possible path. Previous works calculate stretch based on 
packets' traversed path costs or transit times. However, 
for a delay-sensitive application, we are interested in the 
time a packet is live from the application's perspective — 
from the time the packet is generated by the source appli- 
cation to the time it is received by the destination. Thus, 
we define the stretch for a packet that does not fully tra- 
verse the original shortest path, to be the ratio of the time 
the packet is live to the post-link-failure shortest path la- 
tency; for other packets — those that traverse the original 
shortest path — the stretch is 1. For brevity of the ensuing 
discussion, lo denotes the failed link on the primary path 
from source s to destination d; ro denotes the router that 
is adjacent to and upstream from lo on the primary path; 
and to denotes the time of failure of lo. 
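
As a minimal sketch of this stretch definition (the function and argument names are illustrative and not from the paper), the stretch of one packet can be computed from its generation and delivery times:

```python
def packet_stretch(generated_at, received_at, post_failure_latency,
                   used_original_path):
    """Stretch as defined above.

    Packets that fully traverse the original shortest path have stretch 1.
    For all other packets, stretch is the time the packet is live divided by
    the post-link-failure shortest path latency from s to d.
    All times are in the same unit (e.g., ms).
    """
    if used_original_path:
        return 1.0
    return (received_at - generated_at) / post_failure_latency


# A redirected packet that is live for 30 ms when the post-failure
# shortest path takes 20 ms.
redirected = packet_stretch(100, 130, 20, used_original_path=False)
# A packet delivered along the original shortest path.
unaffected = packet_stretch(10, 25, 20, used_original_path=True)
```

Note that queueing at the source counts toward the live time, which is why schemes that queue or resend packets can show stretch well above that of redirected packets.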
Modeling delay at network devices. A router in the 
network, upon a link failure, has to perform a number of 
tasks before it has new valid default next hops for affected 
destinations. The four major tasks are: (1) detecting a 
failed link (if the router is adjacent to the failed link) and 
generating a control plane message; (2) processing of re- 
ceived control packets; (3) computing the new shortest 
path tree (SPT); and (4) updating the forwarding
information base (FIB). We assume that the delay in detecting
a failed link is zero since, irrespective of the underlying
routing architecture, all packets during this period are
lost (see footnote 4); this does not make a difference in our
performance comparison results. We consider the three other
major contributors.

Let $d_r$ be the time spent by a router in processing
a control packet (i.e., the time between the router's
receipt and forwarding of the packet). $d_r$ (along with link
latencies) dictates the propagation rate of control packets
through the network. Let $d_p$ be the delay between a
router's learning of the link failure and starting a new SPT
computation; $d_c$ be the time taken to compute the new
SPT; and $d_u$ be the time taken to update the FIB. Note
that, upon receiving a control packet, a router necessarily
spends $D = d_p + d_c + d_u$ time before having new valid
default next hops for affected destinations. The values of
$d_c$ and $d_u$ depend on the router architecture, algorithms
in use, the topology, and the router's location. Lacking a
good model, we set these values to 0 in our simulations.
However, we use $D = d_p = 50$ ms [IS] and $d_r = 2$ ms [SJ|1|]
for the Sprint topology. For the AS-level and router-level
topologies, we use $D = d_r = 0$.
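
To illustrate how $d_r$ and $D$ combine under a flooded control plane, the sketch below computes when each router has new valid next hops. It is an illustration under stated assumptions, not the paper's simulator: the router adjacent to the failure originates the advertisement at the failure time with no extra delay, every other router adds $d_r$ before forwarding it, and every router needs a further $D$ after receipt. The function name and toy topology are hypothetical.

```python
import heapq


def router_ready_times(graph, r0, failed_link, t0, d_r, D):
    """Return {router: time at which it has new valid next hops}.

    graph[node] = {neighbor: link latency}; failed_link is a (u, v) pair
    that carries no control traffic after the failure.
    """
    failed = frozenset(failed_link)
    arrival = {r0: t0}  # r0 detects the failure at t0 (zero detection delay)
    heap = [(t0, r0)]
    settled = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        # Intermediate routers spend d_r processing before forwarding.
        depart = t if u == r0 else t + d_r
        for v, latency in graph[u].items():
            if frozenset((u, v)) == failed:
                continue
            if depart + latency < arrival.get(v, float("inf")):
                arrival[v] = depart + latency
                heapq.heappush(heap, (depart + latency, v))
    return {u: t + D for u, t in arrival.items()}


# Toy topology with the Sprint parameters from the text (times in ms).
graph = {
    "s": {"a": 1, "b": 2},
    "a": {"s": 1, "d": 1, "b": 2},
    "b": {"s": 2, "a": 2, "d": 2},
    "d": {"a": 1, "b": 2},
}
ready = router_ready_times(graph, r0="a", failed_link=("a", "d"),
                           t0=150, d_r=2, D=50)
```

Setting `d_r=0` and `D=0` in this sketch reduces the result to pure propagation delay from the point of failure.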

4.2.1 Failure reaction schemes 

The performance of source routing protocols also de- 
pends on the control plane mechanism: the technique used 
to inform sources about the failures in the network. We 
describe three variants of SlickPackets design with different
control plane mechanisms. We also describe three
protocols, one from the SCR paradigm and two from the
NCR paradigm, that we compare with SlickPackets.

4 Unless packets are duplicated along multiple paths, a
design point that may be reasonable for certain kinds of
traffic, but which we do not consider in this paper.

Flooded-SlickPackets. Upon detecting the link failure,
ro floods the network with a link state advertisement
(LSA). This is similar to running an SCR protocol with
an OSPF [21]-style control plane mechanism.
Fast-SlickPackets. When ro receives a packet whose
primary next-hop traverses lo, it informs s about the
link failure by directly sending an ICMP-style notification
message to s. The rationale is that, to reduce control
overhead, only sources that use lo in their primary paths
need to be notified. Intuitively, this significantly reduces
the control plane packets sent into the network.

e2e-SlickPackets. The router ro piggybacks the link-failure
information on the packet being forwarded on the
alternate path towards d, which, upon receiving this
information, may inform s of the link failure. Thus, failure
information is sent to the source in an end-to-end manner.

All SlickPackets schemes use the same FS selection
algorithm (§3.2) and incur the delay D between learning
of the failure and switching to new primary paths.
Vanilla source routing (VSR). For purposes of com- 
parison with SlickPackets, we evaluate a simple "vanilla" 
source routing protocol. In VSR, each source s specifies 
a single shortest path to its destination d in the packet 
header. For the control plane mechanism, we use the 
"fast" version, where ro directly notifies s. After receiving 
the notification, s incurs the delay D before computing a 
new shortest path. Without a valid path, packets gener- 
ated during this time are queued. Packets that use lo in 
their paths will be dropped by ro after the link failure. 
However, once s has computed a new path, it resends the 
packets that would have been dropped, i.e., those that it 
sent in the time interval [t - R, t), where t is the time s
learned of the failure, and R is the RTT between s and ro. 
Note that for some of these resent packets, there could be 
two concurrent live copies: the resent copy that will be 
delivered along the new path, and the original copy that 
will be dropped when it reaches ro. This scheme may be 
difficult or undesirable to implement in practice, but as 
an idealized VSR, it is a useful comparison. 
Ideal-SafeGuard. We simulated an idealized version 
of SafeGuard [18], a network-controlled routing protocol
that achieves fast failure reaction. SafeGuard uses the 
standard OSPF as the control plane substrate. In Safe- 
Guard, ro immediately uses pre-computed shortest alter- 
nate paths to quickly redirect packets that it would other- 
wise forward along lo. Other routers recognize redirected 
("escort mode") packets and forward them along their in- 
tended alternate paths; however, until they have updated 
their FIBs (after delay D after receiving the LSA), these 
routers continue to forward "normal mode" packets along 
their sub-optimal paths towards lo. In practice, the
"alternative path databases," which are found to be 2 to 8
times larger than a router's intradomain FIB [18], might
increase lookup latencies or be an impractical memory 
requirement. However, our ideal version of SafeGuard ig- 
nores these issues. 

Ideal-NCR. This represents an ideal (and unachievable)
NCR scheme, in which each router learns of a link failure
in exactly the propagation delay along the shortest path
from the point of failure to the router; and the router
instantly begins forwarding packets along the shortest
alternate path. Ideal-NCR is equivalent to a special case
of Ideal-SafeGuard where all delays, except propagation
delay, are zero (i.e., $D = d_r = 0$).

4.2.2 Methodology 

We wrote a static simulator for our evaluation purposes.
The simulator uses the packet stretch computations
described in Appendix A. Since we are evaluating the
reaction to single-link failures, we evaluate only (lo, s, d)
triples where the primary path from s to d uses lo, and
s and d remain connected after the failure of lo, so that
at least one alternate path to d exists for each router
upstream from lo. For the Sprint topology, we evaluate
all 424,569 possible such triples. For each of the AS- and
router-level topologies, we sample 1,000 random links and
use a sampling algorithm (described in Appendix C) to
obtain over 750,000 and 890,000 such triples, respectively.

In our simulations, the application at the source generates
packets every 1 ms, starting at time t = 0 ms. For
the time of link failure to, however, recall that in Ideal- 
SafeGuard, Ideal-NCR, and Flooded-SLICKPACKETS, ro 
floods the LSA when it detects the link failure, not when 
it receives sources' packets. For these schemes, the sooner 
the link fails, the sooner intermediate routers and the 
source learn of the failure and use better paths. So, for 
a fair comparison with non-flooding schemes, we consider
two extreme points: when to is greater than the network 
diameter in terms of link latencies and when to = 0. The 
former case ensures that by the time to, all sources in all 
evaluated (lo,s, d) triples have had packets reaching ro. 
For the Sprint topology, with a diameter of 139 ms, we 
use to = 150. For the AS- and router-level topologies, we 
assume all links have latencies 1 ms and use to = 50. 

4.2.3 Results 

The high-level results reveal that SlickPackets schemes
(particularly the Fast and Flooded variants) achieve packet
stretch comparable to that of the NCR scheme Ideal-SafeGuard.
Although SlickPackets schemes take slightly longer to
converge compared to SafeGuard, they avoid the high
packet stretch of Fast-VSR.

Average stretch. Fig. [6] shows the packet stretch aver- 
aged over all evaluated (lo, s, d) triples when to is greater 
than the network diameter. We first consider features 
common to all schemes. For a given scheme, all packets 
generated early in the simulation have stretch 1. Grad- 
ually, as packets generated closer to to, as well as more 
triples where s is closer to lo, are affected by the fail- 
ure, the average stretch increases. Additionally, for any 
triple, all packets generated after to have stretch no higher 
than those generated at to ; this is reflected in the average 
stretch over all triples. 

We now compare NCR and SlickPackets schemes. In
NCR schemes, routers upstream from lo, once they receive
the LSA and update their FIBs, can redirect packets
before they reach lo; while in SlickPackets schemes,
packets have to reach lo before being redirected. This
difference gives NCR schemes only a small advantage for early
packets, especially for the Sprint topology in Fig. 6(a),
because upstream routers still incur the delay D between
receiving the LSA and updating their FIBs. For later
packets, this advantage becomes more significant as more
upstream routers update their FIBs. As expected,
Ideal-NCR is the best performing scheme in all three topologies:
it converges 57 ms before Ideal-SafeGuard for the Sprint
topology (due to $D = 50$ and $d_r = 2$) and is equivalent to
Ideal-SafeGuard (not shown) in the other two topologies,
where $D = d_r = 0$.

Figure 6: Average packet stretch - 1 vs. packet generation time when to is greater than the network
diameter. The y-axes are on log scales. For the Sprint topology, to = 150, $D = 50$, $d_r = 2$. For the AS- and
router-level topologies, to = 50, $D = d_r = 0$.

Figure 7: Worst packet stretch vs. packet generation time when to is greater than the network diameter.
The y-axes are on log scales. For the Sprint topology, to = 150, $D = 50$, $d_r = 2$. For the AS- and router-level
topologies, to = 50, $D = d_r = 0$.

Figure 8: Average packet stretch - 1 vs. packet generation time when to = 0. The y-axes are on log
scales. For the Sprint topology, $D = 50$, $d_r = 2$. For the AS- and router-level topologies, $D = d_r = 0$.

Figure 9: Worst packet stretch vs. packet generation time when to = 0. The y-axes are on log scales. For
the Sprint topology, $D = 50$, $d_r = 2$. For the AS- and router-level topologies, $D = d_r = 0$.

Consider the SlickPackets schemes in Fig. 6(a). We
see that for packets generated between to = 150 and
to + D = 200, the average packet stretch is (1) constant
within the same scheme and (2) identical across all
schemes. Recall that all SlickPackets schemes use the
same FS selection algorithm and incur the same delay D
between learning of the failure and switching to new
primary paths. Thus, the only factor affecting their relative
performances is the time s learns of the failure, which is
determined by the relative distances among lo, s, and d for
different triples in the same scheme, and the different
control schemes given the same triple. So, regardless of the
(lo, s, d) triple or the control scheme, there is a minimum
window of D time where s uses the same (old) primary
path. After this window, we can see that Fast-SlickPackets
converges slightly faster than Flooded-SlickPackets
because the LSAs in Flooded-SlickPackets incur
delay $d_r$ at intermediate routers; in Fig. 6(b) and (c),
where $d_r = 0$, Fast- and Flooded-SlickPackets are
identical. Both converge significantly faster than
e2e-SlickPackets, as expected.

Finally, we see that in Fast-VSR, early packets
experience higher stretch than in other schemes. This is
because these packets are dropped and have to be resent
by s. They experience on average a delay of one half the
RTT between s and ro, plus the delay D before being sent
along the new path, resulting in a high stretch. However,
Fast-VSR can catch up to and overtake Fast-SlickPackets
for two reasons. First, consider the packet sent 1 ms
before s learns of the failure: in Fast-VSR, it is delayed
(1 + D) ms before being resent along the new path; while
in Fast-SlickPackets, the amount of time this packet
traverses the original primary path only to be redirected
backwards can be larger than (1 + D), especially if both
the primary path and alternate path contain a very high
latency link. Second, consider the packet generated 1 ms
before s has a new primary path: in Fast-VSR, it is
delayed (queued) only 1 ms before being sent on the new
optimal path; while in Fast-SlickPackets, this packet
will be sent along the original primary path and will be
redirected, experiencing a higher stretch than its
Fast-VSR counterpart. These two effects enable Fast-VSR to
noticeably overtake Fast-SlickPackets in Fig. 6(a), but
in Fig. 6(b) and (c), where D = 0 and all links have
latencies 1 ms, these two effects are less pronounced.
Worst stretch. Fig. 7 shows the worst stretch of packets
given their generation time, among all evaluated (lo, s, d)
triples, when to is greater than the network diameter.
Note that the simulation-wide worst stretches for all schemes
except Fast-VSR are equal: 2.93, 2.0, and 2.2
in Fig. 7(a), (b), and (c), respectively. This is because
none of these schemes drops packets, so the worst stretch
is that of packets that ro redirects, which is the same for
all these schemes. Also note that for schemes that do
not drop or queue packets, the worst stretch occurs when
a packet traverses the maximum possible distance along
the original shortest path without reaching d, is redirected
back to s, and traverses the shortest alternate path. So, 3
is the upper-bound stretch because the shortest alternate
path cannot be shorter than the original shortest path.
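
The bound of 3 can be written out explicitly. In the sketch below, the symbols $x$, $L$, and $A$ are introduced only for illustration: $x$ is the latency the packet covers along the original shortest path before being turned around, $L$ is the latency of the original shortest path from s to d, and $A$ is the latency of the post-failure shortest path from s to d. The worst case assumes the redirected packet retraces its steps to s before following that path.

```latex
\[
\text{stretch}
  = \frac{x + x + A}{A}
  = 1 + \frac{2x}{A}
  < 1 + \frac{2L}{A}
  \le 3,
\qquad \text{since } x < L \le A .
\]
```

The observed simulation-wide maximum of 2.93 for the Sprint topology is consistent with this bound.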

For the Sprint topology in Fig. 7(a), the simulation-wide
worst stretch for Fast-VSR is 27. This happens to
packets sent right before to = 150 in triples where s is
close to d, so that the time duration D that these packets
are delayed dominates the latencies of the original and
post-link-failure shortest paths. In the AS- and router-level
topologies, where D = 0, the simulation-wide worst
stretches of Fast-VSR are 2.75 and 2.88, respectively.
When to = 0. Figs. 8 and 9 show the results for
when to = 0. The overall behavior of each individual
scheme exhibits similar patterns to when to is greater than
the network diameter. The difference is that the peak
stretches occur for packets generated at to = 0.
Furthermore, as expected, flooding schemes benefit from the
earlier time of failure: for example, for the Sprint topology in
Fig. 8(a), Ideal-NCR and Ideal-SafeGuard converge further
ahead of Fast-SlickPackets compared to Fig. 6(a),
and even Flooded-SlickPackets now converges ahead
of Fast-SlickPackets (similarly for the AS- and
router-level topologies).

In terms of simulation-wide worst stretch, those of
non-flooding schemes (Fast- and e2e-SlickPackets as well as
Fast-VSR) are the same as when to is greater than the
network diameter. This is as expected because for these
schemes, it is still ro that redirects packets and/or triggers
the notification of sources. For flooding schemes, however,
it can be expected that simulation-wide worst stretch
would be lower compared to when to is greater than the
network diameter. Nevertheless, the Sprint topology
contains triples where an upstream link that is close to ro
has very high latency compared to the distance between
s and ro, so that s's first packet does not benefit from
the flooded LSA: it still has to reach ro before being
redirected. This results in the simulation-wide worst stretch
of 2.93 in Fig. 9(a).

5. DISCUSSION: FORWARDING 
SUBGRAPH SELECTION 

The SlickPackets design is agnostic to how the source 
selects the forwarding subgraph (FS). For example, the 
FS selection may be guided by demands of the applica- 
tion running at the source (for example, if the source is 
an end host) or the performance goals of a network op- 
erator (for example, if the source is an edge router). In 
this paper, we presented and evaluated one such FS selec- 
tion algorithm: where the FS allows re-routing of packets 
within the network in case of single-link failures. We now 
discuss alternative FS selection strategies. 
Handling node failures. For the FS to handle node
failures, we need only a simple modification to the
link-failure-avoiding FS selection of §3.2. A source s, for a
given destination d, constructs the FS in three steps.
First, s computes a primary path P to d by running an
instance of the shortest path algorithm. Next, to protect
against single node failures, s visits each node along P,
and computes the alternate path $P_i$ it would prefer the
packet to be routed along if that node were to fail. In
particular, for each node $v_i$ on the primary path with node
$v_{i+1}$ as the next hop along the primary path, we (a) remove
$v_{i+1}$; (b) compute a shortest path from $v_i$ to d; and
(c) restore $v_{i+1}$.
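
The remove, recompute, restore loop can be sketched as below. This is an illustration only: the names are hypothetical, the graph is a dictionary of per-neighbor latencies, and "removing" a node is emulated by skipping it during the search rather than mutating the graph. The sketch also assumes that no alternate is computed when the next hop is the destination itself, a case the text does not spell out.

```python
import heapq


def shortest_path(graph, src, dst, removed=frozenset()):
    """Dijkstra over graph[node] = {neighbor: latency}, ignoring removed nodes."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            if v in removed:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def node_failure_alternates(graph, primary):
    """For each v_i on the primary path, the shortest path to the destination
    that avoids its primary next hop v_{i+1}."""
    dst = primary[-1]
    alternates = {}
    for v_i, v_next in zip(primary, primary[1:]):
        if v_next == dst:
            continue  # assumption: a failed destination cannot be routed around
        alt = shortest_path(graph, v_i, dst, removed=frozenset([v_next]))
        if alt is not None:
            alternates[v_i] = alt
    return alternates


# Toy topology (latencies in ms), chosen so that no shortest path is tied.
graph = {
    "s": {"a": 1, "c": 3},
    "a": {"s": 1, "b": 1, "c": 3},
    "b": {"a": 1, "d": 1},
    "c": {"s": 3, "a": 3, "d": 3},
    "d": {"b": 1, "c": 3},
}
primary = shortest_path(graph, "s", "d")
alternates = node_failure_alternates(graph, primary)
```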

Handling multiple link failures. A source may desire 
to construct an FS that protects against multiple link 
failures. This may be done by extending the scheme from 
[|3] to construct an FS that protects from multiple edge- 
failures. For example, it may be sufficient to have two 
strategically chosen alternate paths for all nodes on the 
primary path. The idea is that the source can choose 
alternate paths that are not failure-correlated with the 
primary path. This may allow a much larger amount of
resiliency, although the performance evaluation of such a
scheme is left to future work.

Congestion avoidance. Our focus in this work so 
far has been on dealing with failures. However, alternate 
paths in the FS may also be used to react to congestion in 
the network. For example, intermediate routers along the 
path may choose to forward the packet along an alternate 
path if the primary path is congested (e.g., if the inter- 
face queue for the corresponding link is filled beyond a 
particular threshold). Using an FS also enables the source
to optionally provide control over load balancing, by
providing feedback on which set of paths is tolerable for the
load balancing process.
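
One way a router could apply such a threshold is sketched below. The policy, the 0.8 default, and all names are hypothetical; the paper does not specify a congestion rule, so this only illustrates the idea of preferring the primary next hop unless its queue is too full.

```python
from typing import Dict, Optional


def choose_next_hop(primary: Optional[str], alternate: Optional[str],
                    queue_fill: Dict[str, float], link_up: Dict[str, bool],
                    threshold: float = 0.8) -> Optional[str]:
    """Pick a next hop from the two options carried in the packet's FS.

    queue_fill maps a next hop to the fill fraction (0.0 to 1.0) of the
    interface queue for that link; link_up maps a next hop to link status.
    """
    def usable(hop):
        return hop is not None and link_up.get(hop, False)

    # Normal case: primary link is up and not congested.
    if usable(primary) and queue_fill.get(primary, 0.0) <= threshold:
        return primary
    # Primary is down or congested: deflect to the alternate if possible.
    if usable(alternate):
        return alternate
    # No usable alternate: a congested but live primary beats dropping.
    return primary if usable(primary) else None


both_up = {"a": True, "b": True}
uncongested = choose_next_hop("a", "b", {"a": 0.5}, both_up)
congested = choose_next_hop("a", "b", {"a": 0.9}, both_up)
primary_down = choose_next_hop("a", "b", {}, {"a": False, "b": True})
no_option = choose_next_hop("a", "b", {}, {"a": False, "b": False})
```

A real design would also need to keep packets of one flow from oscillating between paths, which this sketch does not address.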

6. RELATED WORK 

Our goals are related to two key areas of related work: 
Failure reaction in network-controlled routing pro- 
tocols. There has been much work on coping with 
failures in IP networks. We focus on the most closely 
related work: protocols that guarantee packet delivery 
in the presence of one or more link failures. R-BGP [IS] 
constructs interdomain backup paths to handle single link 
failures, given some assumptions about routing policies. 
SafeGuard [18] uses a remaining path cost field in a packet 
as a heuristic to determine whether the path expected by 
the previous hop is different than the path available to the 
current hop. In this way, it can decide when to reroute 
packets along pre-computed backup paths. FCP [17] takes 
a different approach to determining when packets should 
be rerouted: each packet carries a list of the failed links 
it has encountered. The best backup paths are com- 
puted on the fly at routers, thus allowing FCP to be ro- 
bust to multiple link failures, but requiring fairly heavy- 
weight graph processing in the data plane. MPLS Fast 
Reroute [22] relies on precomputation of backup paths. In 
its local repair variant, an additional path is constructed 
to avoid each neighboring link or node, which can inflate 
storage requirements and will not result in lowest-stretch 
backup paths. As discussed in the introduction, all of 
the above approaches are NCR protocols, which do not 
permit source control of primary or backup paths. In 
addition, backup paths are computed or stored at every 
router within the network, so that there is a dependency 
between each router's forwarding table and the topology 
of the entire network. 

One way to get a small amount of route control at the 
source within an NCR architecture is to use multihoming: 
the source can then select between several providers [4]. 
This could be used to enable some source control, while 
still applying the NCR resilience techniques described above. 
However, this provides only a very limited amount of control
to the source, and does not yield the full benefits of
source control described in the introduction. Moreover, if 
many sources are multihomed, this vastly increases rout- 
ing state within the network, since each router would be 
required to know about every point of multihoming at- 
tachment if we desire to provide alternate paths that avoid 
a failure of one of these links. 

Our use of routing along FSes was inspired by [19],
which argues that a directed acyclic graph is a better for- 
warding architecture than the more traditional shortest- 
path tree. While [19] focuses on improving NCR schemes, 
we target achieving the benefits of both network- and 
source-controlled routing. Additionally, while [19] will 
deliver every packet even during link failures, it does not 
guarantee the latency that these packets will have.
SlickPackets can guarantee that for single-link failures,
packets will follow the shortest alternate path from the point
of failure to the destination. 

Source routing. There is also a large body of work on 
source controlled routing, ranging from dynamic source 
routing in wireless networks [15] to future interdomain 
routing architectures [10, 20, 29, 30]. Two of these,
Routing Deflections [30] and Path Splicing [20], target fast
re-routing within the network. Both use path label bits 
set by the source to pseudorandomly select a next hop at 
each router or AS. In [20], pseudorandom forwarding can 
lead to forwarding loops. In [30], routers follow certain
rules that ensure loop-freedom, but reduce path diversity. 

There are three important differences between [20, 30]
and SlickPackets. First, [20, 30] do not fully support
source control over primary or backup routes; although
sources can select among some set of paths, they cannot
tell which paths they are selecting. Second, although
packets can be rerouted quickly within the network after
a link failure, this is not guaranteed (packets may be
dropped), and the backup paths are not guaranteed to
have optimal latency. Third, [20, 30] are similar to
traditional NCR schemes in terms of the state in the network;
indeed, [20] increases forwarding table size because each
router stores multiple next-hops for each destination. In
contrast, SlickPackets enables source control, can guarantee
resilience (see footnote 5) to single-link failures with packets sent
along the shortest alternate path from the point of failure
to the destination, and requires only local state at routers.

Giving sources control over constructing end-to-end paths 
introduces a number of practical questions, for example 
in terms of policy compliance, security, and scalability of 
disseminating topological state. For these questions, we 
rely on previous work (e.g., [10, 29], and citations within),
which provides solutions to these problems.

7. CONCLUSION 

In this paper, we presented SlickPackets, an approach 
to routing that attains failure reaction, while simultane- 
ously retaining the benefits of source routing. Slick- 
Packets works by compactly encoding a set of alternate 
paths into data packet headers as a directed acyclic graph. 
Towards this goal, we provide simple algorithms for com- 
puting efficient graphs, and for encoding them into pack- 
ets in a manner that can be processed by intermediate 
routers in an efficient manner. 

5 Unless, of course, no alternate path exists. 



One major area left for future work is to evaluate the 
complexity of implementing SlickPackets in production 
routers, and achievable packet forwarding rates; a key 
challenge here is dealing with increased header size. A 
promising avenue for evaluation is the Supercharged Plan- 
etLab Platform [25], a network processor-based platform 
on which John DeHart has implemented a prototype ver- 
sion of SlickPackets. 

This work was supported by National Science Founda- 
tion grant CNS 10-40396. 
APPENDIX

A. COMPUTING PACKET STRETCH 

We describe here the stretch computations our simulator
in §4.2.2 uses. Given a source-destination pair s, d and
a failed link lo on the primary path between s and d, we
wish to compute the stretch experienced by each packet
that the application at source s generates. We assume
the application at s generates packets every 1 ms, starting
at time t = 0. Further, we assume that all nodes in
the network have sufficient queue space so that no packet 
is dropped for lack of queue space, and that the nodes 
can fully flush their queues/buffers instantaneously. We 
assume links have sufficient capacity and devices have suf- 
ficient data-plane processing capabilities, so that they do 
not introduce delays to data packets. 

First, we give an overview of our approach (notation 
summarized in Table 1). For a given $(l_0, s, d)$ triple in 
Fig. 10, consider a router r on the primary path that is 
upstream from $l_0$. After the failure of $l_0$, router r can 
"offer" two types of stretch to packets that reach it: (a) 
to packets that r redirects along its alternate path to d, a 
fixed stretch $\sigma(r)$; (b) to packets that r forwards along its 
primary path, whatever stretch is offered by downstream 
routers on the primary path. The two important features 
of a router r are then the time $\tau(r)$ at which it 
starts redirecting packets along its alternate path, and 
the fixed stretch $\sigma(r)$ it offers such redirected packets. 
Then, given $\tau(r)$, we can compute the sent time of the 
first packet from s that will be redirected by r; packets 
sent before this packet will be forwarded to r's primary 
next-hop and thus experience whatever stretch is offered by 
r's downstream routers. Applying the above analysis to 
all routers upstream from $l_0$, our simulator determines 
the stretch experienced by any packet given the time it is 
generated by the source application. 

| Notation | Description |
|---|---|
| $l_0$ | a failed link on the primary path from source s to destination d |
| $r_0$ | the node upstream from and adjacent to $l_0$ ($r_0$ might be source s itself) |
| $t_0$ | the time of failure of link $l_0$ |
| $t_{\mathrm{learn}}(u)$ | the time a node u learns of the failure, by detecting it or being notified ($t_{\mathrm{learn}}(u) \geq t_0$) |
| $\tau(u)$ | the time a node u can start sending packets along the post-failure shortest path from u to d |
| $\sigma(u)$ | the stretch experienced by packets that a node u sends along the post-failure shortest path from u to d |
| $\mathrm{dist}(u, v)$ | the shortest path latency between nodes u and v with no link failure |
| $\mathrm{dist}(u, v, l)$ | the shortest path latency between nodes u and v after the failure of link l |
| $h(u, v)$ | the hop-count between nodes u and v with no link failure |

Table 1: Notation used in the packet stretch computations 



Figure 10: A primary path from s to d with failed 
link $l_0$. $r_0$ is the router upstream from and adjacent 
to $l_0$. 



A.1 SlickPackets 

With all SlickPackets variants, for a particular $(l_0, s, d)$ 
triple, the only two nodes that potentially redirect packets 
are $r_0$ and source s, so we only need to analyse these 
two nodes. 

Consider $r_0$. It starts redirecting packets at time 
$\tau(r_0) = t_0$. The first packet from s that will be redirected 
by $r_0$ arrives at $r_0$ at time $\max\{t_0, \mathrm{dist}(s, r_0)\}$ and 
thus is sent/generated by s at time 

$$\max\{t_0, \mathrm{dist}(s, r_0)\} - \mathrm{dist}(s, r_0) \tag{1}$$

This packet and all packets redirected by $r_0$ experience 
its fixed stretch, which is 

$$\sigma(r_0) = \frac{\mathrm{dist}(s, r_0) + \mathrm{dist}(r_0, d, l_0)}{\mathrm{dist}(s, d, l_0)} \tag{2}$$

Consider s. It is easy to see that 

$$\sigma(s) = 1 \tag{3}$$



To compute the time that s first "redirects" packets (that is, 
sends them along its new primary path to d), note that 
for all SlickPackets variants, this time is 

$$\tau(s) = t_{\mathrm{learn}}(s) + D$$

Now, we show the derivation of $t_{\mathrm{learn}}(s)$ for each 
SlickPackets variant. 

Flooded-SlickPackets. At time $t_0$, when it detects 
the link failure (not when it receives packets from source 
s), the router $r_0$ floods the network with the LSA. Thus, 
s receives the LSA and learns of the link failure at time 

$$t_{\mathrm{learn}}(s) = t_0 + \mathrm{dist}(r_0, s) + h(r_0, s) \cdot d_r \tag{4}$$



Fast-SlickPackets. Upon receiving a packet from s 
that requires redirection, $r_0$ sends a notification message 
to s about the failed link. Thus, s receives the notification 
message and learns of the link failure at time 

$$t_{\mathrm{learn}}(s) = \max\{t_0, \mathrm{dist}(s, r_0)\} + \mathrm{dist}(s, r_0) \tag{5}$$

e2e-SlickPackets. Upon receiving a packet from s 
that requires redirection, $r_0$ piggybacks the notification 
message in the packet on the way to d, which receives the 
packet at time $\max\{t_0, \mathrm{dist}(s, r_0)\} + \mathrm{dist}(r_0, d, l_0)$. After 
d computes its new shortest path to s, it sends the notification 
message to s. Thus, s receives the notification 
message and learns of the link failure at time 

$$t_{\mathrm{learn}}(s) = \max\{t_0, \mathrm{dist}(s, r_0)\} + \mathrm{dist}(r_0, d, l_0) + D + \mathrm{dist}(d, s, l_0) \tag{6}$$

A.2 Ideal-SafeGuard 

At time $t_0$, when it detects the link failure (not when it 
receives packets from s), the router $r_0$ floods the network 
with the LSA. As routers upstream from $r_0$ learn about 
the link failure, they can potentially redirect packets along 
shorter paths, reducing the amount of time packets traverse 
a sub-optimal path, i.e., packets do not have to reach 
the failure before being redirected. 

Consider $r_0$. It learns of the link failure at time $t_0$. 
Because it has already pre-computed alternate paths to 
d, it can immediately redirect packets along the alternate 
path. Thus the first packet from s that will be redirected 
by $r_0$ arrives at $r_0$ at time $\max\{t_0, \mathrm{dist}(s, r_0)\}$ and thus is 
sent/generated by s at time 

$$\max\{t_0, \mathrm{dist}(s, r_0)\} - \mathrm{dist}(s, r_0) \tag{7}$$

This packet and all packets redirected by $r_0$ experience 
its fixed stretch 

$$\sigma(r_0) = \frac{\mathrm{dist}(s, r_0) + \mathrm{dist}(r_0, d, l_0)}{\mathrm{dist}(s, d, l_0)} \tag{8}$$

Consider a node r upstream from $r_0$. (r can be a 
router or source s.) It learns of the link failure at time 

$$t_{\mathrm{learn}}(r) = t_0 + \mathrm{dist}(r_0, r) + h(r_0, r) \cdot d_r$$

Because it is not adjacent to the failed link, r only starts 
redirecting packets along its alternate path to d at time 
$t_{\mathrm{learn}}(r) + D$. Thus, the first packet from s that will be 
redirected by r along the alternate path is sent by s at 
time 

$$\max\{0,\ t_{\mathrm{learn}}(r) + D - \mathrm{dist}(s, r)\} \tag{9}$$

This packet and all packets redirected by r experience its 
fixed stretch 

$$\sigma(r) = \frac{\mathrm{dist}(s, r) + \mathrm{dist}(r, d, l_0)}{\mathrm{dist}(s, d, l_0)} \tag{10}$$



Next, consider two adjacent routers r and r' upstream 
from $l_0$ where $r' \neq r_0$ and r is upstream from r'. Let t and 
t', given by Eq. 9, be the sent times of the first packets 
from s that are redirected by r and r', respectively. Note 
that $\mathrm{dist}(r_0, r) > \mathrm{dist}(r_0, r')$, $h(r_0, r) > h(r_0, r')$, and 
$\mathrm{dist}(s, r) < \mathrm{dist}(s, r')$; thus $t \geq t'$ ($t = t' \Rightarrow t = t' = 0$). If 
t > t', packets sent by s in the interval [t', t) are redirected 
by r' and thus experience the stretch offered by r'. 
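
The per-router analysis above can be turned into a lookup from a packet's send time to its stretch. The sketch below is one possible reading, not the simulator itself: the `Upstream` record and function names are hypothetical, nodes are listed from s downstream to $r_0$, and a packet is attributed to the most upstream node whose first-redirect send time (Eq. 7 or Eq. 9) it has reached, which relies on the monotonicity argument in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    """Per-node inputs (hypothetical record; all values are latencies/hops)."""
    dist_s_r: float       # dist(s, r)
    dist_r0_r: float      # dist(r0, r)
    hops_r0_r: int        # h(r0, r); 0 means the node is r0 itself
    dist_r_d_fail: float  # dist(r, d, l0)


def first_sent(node, t0, d_r, D):
    """Send time of the first packet from s that this node redirects."""
    if node.hops_r0_r == 0:                       # r0: Eq. 7
        return max(t0, node.dist_s_r) - node.dist_s_r
    t_learn = t0 + node.dist_r0_r + node.hops_r0_r * d_r
    return max(0.0, t_learn + D - node.dist_s_r)  # Eq. 9


def safeguard_stretch(nodes, t_sent, t0, d_r, D, dist_s_d_fail):
    """Stretch of a packet sent at t_sent; nodes ordered from s down to r0.

    Returns None for packets that pass r0 before the failure.
    """
    for node in nodes:
        if t_sent >= first_sent(node, t0, d_r, D):
            # Eqs. 8 and 10 share the same form.
            return (node.dist_s_r + node.dist_r_d_fail) / dist_s_d_fail
    return None


# Made-up example: primary path s - a - r0 with 2 ms links.
path = [
    Upstream(dist_s_r=0.0, dist_r0_r=4.0, hops_r0_r=2, dist_r_d_fail=10.0),  # s
    Upstream(dist_s_r=2.0, dist_r0_r=2.0, hops_r0_r=1, dist_r_d_fail=9.0),   # a
    Upstream(dist_s_r=4.0, dist_r0_r=0.0, hops_r0_r=0, dist_r_d_fail=9.0),   # r0
]
for t in (5.0, 6.0, 16.0, 21.0):
    print(t, safeguard_stretch(path, t, t0=10.0, d_r=1.0, D=5.0, dist_s_d_fail=10.0))
```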

A.3 Fast-VSR 

Unlike SlickPackets and SafeGuard, with VSR, the 
router $r_0$ drops instead of redirecting packets; source s 
has to resend those dropped packets. Also, the only node 
that "redirects" packets is source s: it sends them along 
a new primary path. Furthermore, s queues packets generated 
by the application between the times it learns of 
the failure and is ready to use a new path to d. Thus, 
we only consider source s, but we consider two types of 
packets: those that are dropped and resent, and those 
that are queued. 

The first packet from s that will arrive at $r_0$ after the 
link failure (and thus will be dropped by $r_0$ and later resent by 
s) arrives at time $\max\{t_0, \mathrm{dist}(s, r_0)\}$ (and thus is originally 
sent by s at time $\max\{t_0, \mathrm{dist}(s, r_0)\} - \mathrm{dist}(s, r_0)$). 
Upon receiving this packet, $r_0$ sends a notification message 
to s, which s receives at time 

$$t_{\mathrm{learn}}(s) = \max\{t_0, \mathrm{dist}(s, r_0)\} + \mathrm{dist}(s, r_0)$$

Thus, s is ready to use a new primary path to d at time 

$$\tau(s) = \max\{t_0, \mathrm{dist}(s, r_0)\} + \mathrm{dist}(s, r_0) + D$$

At this time, s instantaneously resends all packets that 
were dropped. These packets were originally 
sent at time 

$$\max\{t_0, \mathrm{dist}(s, r_0)\} - \mathrm{dist}(s, r_0) + \Delta \tag{11}$$

with $\Delta \in [0, 2 \cdot \mathrm{dist}(s, r_0))$ (recall "$[t - R, t)$" in §4.2.1) and 
thus experience stretch 

$$\frac{2 \cdot \mathrm{dist}(s, r_0) - \Delta + D + \mathrm{dist}(s, d, l_0)}{\mathrm{dist}(s, d, l_0)} \tag{12}$$



For the queued packets, after learning of the failure, s 
queues all packets newly generated by the source application 
until it is ready to use a new primary path. These 
packets are generated between $t_{\mathrm{learn}}(s)$ and $\tau(s)$. In other 
words, they are generated at time 

$$t_{\mathrm{learn}}(s) + \Delta \tag{13}$$

with $\Delta \in [0, D]$. Since s also instantaneously sends all 
these queued packets at $\tau(s)$, these packets experience stretch 

$$\frac{D - \Delta + \mathrm{dist}(s, d, l_0)}{\mathrm{dist}(s, d, l_0)} \tag{14}$$

Note that the packet generated at time $t_{\mathrm{learn}}(s) + D$ has 
stretch 1, as is expected. 
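
A compact way to see how Eqs. 11-14 fit together is to compute the Fast-VSR stretch as a function of the original send time. This is a minimal sketch under the appendix's assumptions (instantaneous resend and flush at $\tau(s)$); the function name and the example numbers are hypothetical. In both branches the numerator equals the waiting time until $\tau(s)$ plus the post-failure path latency.

```python
def vsr_stretch(t_sent, t0, dist_s_r0, dist_s_d_fail, D):
    """Stretch of a packet that the application at s generated at t_sent.

    Returns None for packets that pass r0 before the failure.
    """
    arrival = max(t0, dist_s_r0)           # first dropped packet reaches r0
    first_dropped = arrival - dist_s_r0    # its original send time
    t_learn = arrival + dist_s_r0          # s receives the notification
    tau = t_learn + D                      # s is ready to use the new path

    if t_sent < first_dropped:
        return None
    if t_sent < t_learn:
        # Dropped and resent at tau: Eqs. 11-12, delta in [0, 2*dist(s, r0)).
        delta = t_sent - first_dropped
        return (2 * dist_s_r0 - delta + D + dist_s_d_fail) / dist_s_d_fail
    if t_sent <= tau:
        # Queued at s until tau: Eqs. 13-14, delta in [0, D].
        delta = t_sent - t_learn
        return (D - delta + dist_s_d_fail) / dist_s_d_fail
    return 1.0                             # sent directly on the new path


# Made-up example values.
for t in (6.0, 10.0, 14.0, 19.0):
    print(t, vsr_stretch(t, t0=10.0, dist_s_r0=4.0, dist_s_d_fail=10.0, D=5.0))
```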

B. DIRECT ENCODING 

Direct Encoding embeds the Forwarding Subgraph (FS) 
as a directed acyclic graph data structure in the packet 
header. At a high level, each router in the FS — except 
the destination router because it has no outgoing links — 
is encoded exactly once in a structure we call the Nod- 
eDescriptor (ND), at some location (bit offset) within 
the encoding. A router's NodeDescriptor (ND) con- 
tains SuccessorDescriptor (SD) structures, which rep- 
resent the router's next-hop successor(s). A Successor- 
Descriptor (SD) contains (1) the router's locally unique 
link identifier for the next-hop successor and (2) the offset 
pointer to the successor's ND. Finally, the packet header 
contains a "current node offset pointer". A router reads 
this pointer to locate its ND, and updates it to the offset 
of the next hop's ND before forwarding. 

We use the overall format: 



NodePtrLength | CurrentNodePtr | ND_1 | ND_2 | ... | ND_k



NodePtrLength A prefix code that indicates the length 
in bits of the CurrentNodePtr field and all other 
absolute node pointers. The mappings are 0, 10, 110, 
and 1110, for 10, 8, 6, and 4 bits respectively. 

CurrentNodePtr This value specifies the bit offset (from 
the beginning of the encoding) of the current router's 
ND. The value of zero has a special meaning: the 
current router is the final destination/egress router. 

An ND can have either one or two SD's, with the con- 
vention that the first successor is the primary one. The 
ND has the following format: 



NumberOfSuccessors | SD_1 | SD_2

NumberOfSuccessors (1 bit) 0 indicates there is one 
successor, and 1 indicates there are two successors. 

The SD contains two main pieces of information: the 
next-hop identifier and the offset pointer to its ND. For 
the next-hop identifier, similar to the encoding scheme 
discussed in §3.3, we use the router's locally unique link 
identifiers, which it advertises as part of the network map 
dissemination. For the offset to the next-hop's ND, we use 
a 1-bit flag to indicate that the next-hop's ND immedi- 
ately follows the current ND; otherwise, we include an 
absolute offset pointer to the next-hop's ND. Here is the 
SD format: 



LinkId | ContainsPtr? | Ptr

LinkId The identifier of the link to forward the packet. 
The length of this field is specified by the router as 
part of the map dissemination. In our encoding size 
evaluation (§4.1), we assume that it is $\lceil \log_2 \Delta \rceil$ bits. 

ContainsPtr? (1 bit) 0 indicates that the next-hop's ND 
follows immediately after the current router's ND, and 
1 indicates that an absolute pointer (Ptr) follows. 

Ptr Pointer to the next-hop's ND. The length of this 
field is specified by NodePtrLength discussed above. 

The 1-bit flag is only an optimization that allows us to 
leave off the offset to the next-hop's ND. To make this 
optimization useful, the encoding algorithm first encodes 
all nodes on the primary path one after another. The first 
(primary) SD of each of these uses the 1-bit flag because 
the successor's ND immediately follows its own descriptor 
(except in the penultimate router's case, which uses an 
absolute pointer value of zero). The second SD, if any, 
uses the absolute offset pointer. 

After encoding all nodes on the primary path, the en- 
coding algorithm picks one of the alternate paths and en- 
codes all of its yet-to-be-encoded nodes one after another. 
These nodes that are encoded contiguously can use the 
relative pointer for their SD's, and when a node's next- 
hop successor is an already-encoded node, then the next- 
hop successor's offset pointer is used. Also, the penulti- 
mate router uses an absolute pointer value of zero in its 
SD. 
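
The layout rules above can be sketched as a two-pass encoder: the first pass fixes every ND's bit offset (ND lengths depend only on which SDs use the 1-bit flag), and the second emits the bits. This is an illustrative sketch rather than the authors' implementation: the function name, the `paths` and `link_id` inputs, and the use of a '0'/'1' string for the header are assumptions, the pointer width is supplied by the caller instead of being chosen automatically, and every path is assumed to end at the destination.

```python
PREFIX = {10: "0", 8: "10", 6: "110", 4: "1110"}  # pointer width -> NodePtrLength code


def encode(paths, link_id, link_id_bits, ptr_bits):
    """Encode an FS given as node paths (primary first) into a bit string.

    link_id[(u, v)] is u's locally unique identifier of its link to v.
    """
    dest = paths[0][-1]
    order, succ = [], {}
    for path in paths:                       # primary path first, then alternates
        for u, v in zip(path, path[1:]):
            if u not in succ:
                succ[u] = []
                order.append(u)              # each router is encoded exactly once
            if v not in succ[u]:
                succ[u].append(v)
    if any(len(s) > 2 for s in succ.values()):
        raise ValueError("an ND holds at most two SDs")
    follower = dict(zip(order, order[1:]))   # node -> node encoded right after it

    def uses_flag(u, i, v):
        # Only the first SD may rely on "successor's ND follows immediately".
        return i == 0 and follower.get(u) == v

    # Pass 1: compute the bit offset of every ND.
    offset, pos = {}, len(PREFIX[ptr_bits]) + ptr_bits
    for u in order:
        offset[u] = pos
        pos += 1 + sum(link_id_bits + 1 + (0 if uses_flag(u, i, v) else ptr_bits)
                       for i, v in enumerate(succ[u]))
    if max(offset.values()) >= 2 ** ptr_bits:
        raise ValueError("pointer width too small for this FS")

    def bits(value, width):
        return format(value, f"0{width}b")

    # Pass 2: emit NodePtrLength, CurrentNodePtr (the source's ND), then the NDs.
    out = [PREFIX[ptr_bits], bits(offset[order[0]], ptr_bits)]
    for u in order:
        out.append("1" if len(succ[u]) == 2 else "0")      # NumberOfSuccessors
        for i, v in enumerate(succ[u]):
            out.append(bits(link_id[(u, v)], link_id_bits))
            if uses_flag(u, i, v):
                out.append("0")                            # ContainsPtr? = 0
            else:                                          # pointer 0 means destination
                out.append("1" + bits(0 if v == dest else offset[v], ptr_bits))
    return "".join(out)


# Made-up FS: primary A -> B -> D, alternate A -> C -> D.
header = encode([["A", "B", "D"], ["A", "C", "D"]],
                {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 3, ("C", "D"): 1},
                link_id_bits=2, ptr_bits=6)
print(header, len(header))
```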

Forwarding Algorithm. Upon receiving a packet, 
the router first gets the value of the CurrentNodePtr 
(after parsing NodePtrLength). If the value is zero, 
then the router is the egress router, and it can perform 
appropriate actions on the packet (e.g., delivering it on 
attached networks), and it does not forward the packet 
further. 

If CurrentNodePtr has a non-zero offset value, then 
the router parses its ND at that offset. Note, the router 
expects that the lengths of its LinkId fields are what it 
advertised (e.g., $\lceil \log_2 \Delta \rceil$ bits). With that information, 
the router can fully parse its ND. If the link labeled in 
the first SD is online, then the router will use that link 
to forward the packet. Otherwise, if there is a second SD 
and its link is online, then the router will use that link 
to forward the packet. Otherwise, the router drops the 
packet. 

Before forwarding the packet, the router needs to up- 
date the CurrentNodePtr. If the used SD contains an 
absolute offset pointer (i.e., its ContainsPtr? flag is 
1), then the router updates CurrentNodePtr with the 
value in the SD's Ptr: 

CurrentNodePtr ← SD.Ptr 

Otherwise, the successor's ND follows immediately after 
the current router's ND, so to obtain the successor's ND 
offset, the router adds the total length of its own ND 
to its own (CurrentNodePtr) offset, and then updates 
CurrentNodePtr with that value: 

CurrentNodePtr ← |ND| + CurrentNodePtr 
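
The per-hop processing described above can be sketched as follows. This is an assumption-laden illustration, not router code: the header is modeled as a string of '0'/'1' characters for readability, `forward` and the `link_is_up` callback are hypothetical names, and the example header was laid out by hand for a small made-up FS (primary A → B → D, alternate A → C → D, 2-bit link identifiers, 6-bit pointers).

```python
PTR_BITS = {"0": 10, "10": 8, "110": 6, "1110": 4}  # NodePtrLength code -> width


def forward(header, link_id_bits, link_is_up):
    """Process one hop; returns (action, link, header).

    action is "deliver", "drop", or "forward"; link is set only for "forward".
    """
    end = header.index("0") + 1              # prefix code: ones ended by a zero
    ptr_bits = PTR_BITS[header[:end]]
    cur = int(header[end:end + ptr_bits], 2)
    if cur == 0:                             # zero marks the egress router
        return "deliver", None, header

    pos = cur
    n_succ = 2 if header[pos] == "1" else 1  # NumberOfSuccessors
    pos += 1
    successors = []
    for _ in range(n_succ):
        link = int(header[pos:pos + link_id_bits], 2)
        pos += link_id_bits
        has_ptr = header[pos] == "1"         # ContainsPtr?
        pos += 1
        ptr = None
        if has_ptr:
            ptr = int(header[pos:pos + ptr_bits], 2)
            pos += ptr_bits
        successors.append((link, ptr))
    nd_len = pos - cur                       # |ND|

    for link, ptr in successors:             # primary SD first, then alternate
        if link_is_up(link):
            new_cur = ptr if ptr is not None else cur + nd_len
            new_header = (header[:end] + format(new_cur, f"0{ptr_bits}b")
                          + header[end + ptr_bits:])
            return "forward", link, new_header
    return "drop", None, header


# Hand-built header: NDs of A, B, C at bit offsets 9, 22, 32.
hdr = ("110" + "001001"
       + "1" + "010" + "101" + "100000"      # A: link 1 (flag), link 2 -> offset 32
       + "0" + "111" + "000000"              # B: link 3 -> destination
       + "0" + "011" + "000000")             # C: link 1 -> destination
action, link, hdr_at_b = forward(hdr, 2, lambda l: True)
print(action, link, int(hdr_at_b[3:9], 2))
```

Because the router only rewrites CurrentNodePtr, the rest of the header is unchanged from hop to hop.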

C. SAMPLING ALGORITHM FOR SIMULATION 

For each sampled link $l_0$, we evaluate "qualified sources": 
those whose shortest path tree (SPT) includes $l_0$. To find 
qualified sources, we sample up to 2,000 random sources 
and use the first 100 qualified sources, or fewer if we find 
fewer qualified sources. For each qualified source s, we 
randomly sample 100 destinations from among all those 
on the subtree of s's SPT that uses $l_0$. Finally, among 
the sampled destinations, we use only those that remain 
connected with s after removing $l_0$. 
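
The sampling procedure can be sketched in a few lines. The code below is a hedged illustration only: the graph is assumed to be undirected and given as a nested dict of link latencies, sources and destinations are drawn without replacement (the text does not say which), fewer than 100 destinations are used when the subtree is smaller, and all function names are hypothetical.

```python
import heapq
import random


def spt_parents(adj, src, skip=None):
    """Dijkstra from src; returns {node: parent} for reachable nodes.

    skip is an undirected link (a frozenset of two nodes) to ignore.
    """
    dist, parent, heap = {src: 0.0}, {src: None}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u].items():
            if skip == frozenset((u, v)):
                continue
            if v not in dist or d + w < dist[v]:
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return parent


def uses_link(parent, node, link):
    """True if the SPT path from the root to node traverses link."""
    while parent[node] is not None:
        if frozenset((node, parent[node])) == link:
            return True
        node = parent[node]
    return False


def sample_pairs(adj, l0, rng, max_tries=2000, max_sources=100, max_dests=100):
    """Return sampled (source, destination) pairs for failed link l0."""
    link, nodes = frozenset(l0), sorted(adj)
    pairs, qualified = [], 0
    for s in rng.sample(nodes, min(max_tries, len(nodes))):
        if qualified == max_sources:
            break
        parent = spt_parents(adj, s)
        subtree = [d for d in parent if uses_link(parent, d, link)]
        if not subtree:
            continue                         # s's SPT does not include l0
        qualified += 1
        reachable = spt_parents(adj, s, skip=link)
        for d in rng.sample(subtree, min(max_dests, len(subtree))):
            if d in reachable:               # still connected without l0
                pairs.append((s, d))
    return pairs


# Made-up topology: a 6-node ring with unit latencies.
ring = {i: {(i + 1) % 6: 1.0, (i - 1) % 6: 1.0} for i in range(6)}
print(sample_pairs(ring, (0, 1), random.Random(0)))
```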

D. LOWER BOUND ON EDGE-SET SIZE 

How much can we reduce the size of the FS by designing 
more sophisticated algorithms for selecting the FS? How 
close are the results given in 32 to the smallest possible 
header for handling single link failures? 

In order to be able to answer the above questions, we 
derived lower bounds on the edge-set size of FSs that pro- 
vide fast failure reaction against single link failures. That 
is, for any FS that uses the shortest path between the 
source and the destination as the primary path, the lower 
bound gives the minimum number of edges that the FS 
must contain in order to provide an alternate path avoid- 
ing any single-link failure on the primary path. For any 
source-destination pair s,d, the lower bound is given as 
follows: 

$$\begin{cases} 2|P(s,d)| + 1 & \text{if the graph is weighted} \\ \left\lceil \dfrac{5|P(s,d)|}{2} \right\rceil & \text{if the graph is unweighted} \end{cases}$$

where P(s,d) is the primary path and |P(s,d)| the number 
of edges in P(s,d). These lower bounds impose a 
fundamental limitation on the header size of SlickPackets; 
intuitively, it is hard to reduce the header size (in 
bytes) significantly without reducing the edge-set size of 
the resulting FS. We prove the lower bound below. 

Note that a trivial lower bound on the size of the FS 
is 2|P(s, d)| because each node on the primary path must have 
two outgoing edges in order to provide fast failure reaction 
against single link failures. Theorem 1 essentially 
states this bound along with an example graph demonstrating 
that the bound is tight. However, if the graph 
is unweighted (all edges have the same weight), we can 
provide a better bound: intuitively, the alternate paths 
must include extra edges in order to ensure that they are 
at least as long as the primary (which is by definition the 
shortest). We give this improved bound in Theorem 2. 
We assume, in the following proofs, that the graph is not 
a multigraph and is 2-connected. 

Theorem 1. Suppose the FS uses the shortest path P(s, d) 
as the primary path and can avoid any single link failure 
along the primary path. Then the FS has at least 
2|P(s, d)| + 1 edges. Moreover, there exist graphs for which 
this bound is tight. 

Proof. For weighted graphs, we note that FS contains 
|P(s, d)| edges along the shortest path. Furthermore, each 
node along the shortest path requires at least one additional 
outgoing/incoming edge in order to provide fast 
reaction against single link failures. The proof follows by 
noting that there are exactly |P(s, d)| + 1 nodes along the 
shortest path. To prove tightness of the bound, we use 
the graph shown in Fig. 11. □ 

[Figure: a path $s = v_0, v_1, v_2, \ldots, v_{k-2}, v_{k-1}, d = v_k$ with an additional node u] 

Figure 11: A graph that achieves the lower bound 
on the size of the FS for weighted graphs. The 
weight of edges $(v_i, v_{i+1})$ are all 1; the weight of 
edges $(v_i, u)$ is set to $k - i$. 

Before going to the lower bound proof for unweighted 
graphs, we give some definitions to make the discussion 
more succinct. Let G = (V, E) be the graph and, given a 
pair of vertices s, d, let FS be the optimal FS, meaning 
it has the minimum possible number of edges while satisfying 
the conditions in the theorem. Denote the shortest 
path between s and d as 

$$P = P(s,d) = (s = v_0, v_1, \ldots, v_{k-2}, v_{k-1}, v_k = d)$$

and let |P| be the number of edges in P. Let G' = (V, E') 
be a densest graph (with maximum possible number of 
edges) such that P is also the shortest path between s and 
d in G', and let FS' be the optimal forwarding subgraph 
between s and d in G'. Let |FS| and |FS'| be the edge-set 
sizes of the optimal forwarding subgraphs FS and FS'. Let 
Q(u, v) denote the shortest alternate path (as computed 
in G') between any pair of nodes u and v, and N(u) be the 
set of neighbors of any node u. 

Theorem 2. Under the same conditions as Theorem 1, 
except that edges have equal weights, the number of edges 
in the FS is lower bounded by: 

$$\left\lceil \frac{5|P(s,d)|}{2} \right\rceil$$

Moreover, there exist graphs for which the bound is tight. 

Proof. We start with a few simple observations: first, 
since $E \subseteq E'$, we have that $|FS| \geq |FS'|$. Hence, a lower 
bound on |FS'| implies a lower bound on |FS|. Second, 
since G' is unweighted, there is no edge between $v_i$ and 
$v_k$ for any $i < k - 1$; otherwise P cannot be the shortest 
path. Furthermore, to provide fast failure reaction against 
single link failures, FS' must contain an edge $(u, v_k)$ for 
some $u \notin P$ since $(v_{k-1}, v_k) \in P$ and the graph is not a 
multigraph. 

Consider nodes u and $v_{k-1}$. Since the graph is not a 
multigraph, we have that $|Q(v_{k-1}, v_k)| \geq 2$. Hence, we 
can replace $Q(v_{k-1}, v_k)$ by $(v_{k-1}, u, v_k)$ without increasing 
|FS'|; the edge $(u, v_k)$ indeed exists as argued earlier. 
Now, consider node $v_{k-2}$. Note that $|Q(v_{k-2}, v_k)| \geq 2$ 
and hence, we can replace $Q(v_{k-2}, v_k)$ by $(v_{k-2}, u, v_k)$ as 
earlier. 

We make a final observation: let us denote by $FS' \setminus P$ 
the set of nodes that are in the FS' but not in P. We claim 
that for any node $q \in FS' \setminus P$, $|N(q) \cap P| \leq 3$. To prove 
this, suppose by way of contradiction that $|N(q) \cap P| \geq 4$. 
Then at least two of the nodes in N(q) are at distance at 
least 3 along P, while they are connected via q by just two 
hops, contradicting the fact that P is the shortest path. 

To summarize, we have shown that for any node $q \in FS' \setminus P$, 
we have that $|N(q) \cap P| \leq 3$. We have also proved 
that in the (new) optimal FS, $v_{k-2}$ and $v_{k-1}$ are connected 
to a node u that has a direct link to d. Note that 
$u \in FS' \setminus P$ and it is already connected to three nodes 
in P. Hence, in the (new) optimal FS, we have that every 
node $v_i \in P$, $i < k - 2$, must find an alternate path 
to at least one of the nodes $v_{k-2}, v_{k-1}, v_k$ or q and cannot 
have a direct (alternate) edge to any of these nodes. 
We create a new graph G'' by collapsing the four vertices 
$(v_{k-2}, v_{k-1}, v_k, q)$ (call this new node d'); to compute an 
optimal FS on G', we can compute an optimal FS on 
G'' and combine it with the edges between these nodes 
that form a part of FS'. Hence, we have reduced our 
problem to a strictly smaller subproblem with the same 
constraints. This allows us to use a simple recursion. Let 
S(n) denote the edge-set size of the optimal FS in G' with 
|P| = n. Then, we get the following recursion: 

$$S(n) \geq S(n-2) + 5$$

which gives us the claimed lower bound on the edge-set 
size of the FS. To prove that the bound is indeed tight, 
we use the graph shown in Fig. 12. □ 




Figure 12: A graph that achieves the lower bound 
on the size of the FS for unweighted graphs. 
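
For readers who want to see how the recursion S(n) ≥ S(n-2) + 5 unrolls, the following is a short worked derivation. The base cases are assumptions that the proof does not state explicitly: S(0) = 0 (source and destination coincide), and S(1) ≥ 3 (a one-edge primary path plus an alternate path of at least two edges, since the graph is not a multigraph).

```latex
% Assumed base cases (not stated in the proof): S(0) = 0 and S(1) \ge 3.
\begin{align*}
S(2m)   &\ge S(2m-2) + 5 \ge \dots \ge S(0) + 5m = \frac{5 \cdot 2m}{2},\\
S(2m+1) &\ge S(2m-1) + 5 \ge \dots \ge S(1) + 5m \ge 5m + 3
         = \left\lceil \frac{5(2m+1)}{2} \right\rceil .
\end{align*}
```

Under these assumptions, a primary path with n edges needs at least 5n/2 edges in the FS, rounded up when n is odd.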



